Towards a New Image-Based Spectrogram Segmentation Speech Coder Optimised for Intelligibility

Authors

Keith A. Jellyman

Nicholas W. D. Evans

Wei-Ming Liu

John S. D. Mason

Abstract

Speech intelligibility is the very essence of communications. When high noise can degrade a speech signal to the threshold of intelligibility, for example in mobile and military applications, further degradation introduced by a speech coder could prove critical. This paper investigates concepts towards a new speech coder that draws upon the field of image processing in a new multimedia approach. The coder is based on a spectrogram segmentation image processing procedure. The design criterion is minimal intelligibility loss in high noise, as opposed to the conventional quality criterion, subject to a reasonable bit rate. First-phase intelligibility listening test results assessing its potential alongside six standard coders are reported. Experimental results show the robustness of the LD-CELP coder and the potential of the new coder, with particularly good results in car noise conditions below -4.0 dB.


Similar resources

Cipher text only attack on speech time scrambling systems using correction of audio spectrogram

Recently permutation multimedia ciphers were broken in a chosen-plaintext scenario. That attack models a very resourceful adversary which may not always be the case. To show insecurity of these ciphers, we present a cipher-text only attack on speech permutation ciphers. We show inherent redundancies of speech can pave the path for a successful cipher-text only attack. To that end, regularities ...


Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...


Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification

Improving speech system performance in noisy environments remains a challenging task, and speech enhancement (SE) is one of the effective techniques to solve the problem. Motivated by the promising results of generative adversarial networks (GANs) in a variety of image processing tasks, we explore the potential of conditional GANs (cGANs) for SE, and in particular, we make use of the image proc...


Segmentation based coding of images

This article presents a complete still image coder for gray scale images. The coder is based on segmenting the image into homogeneous objects which are coded independently. The coder consists of three parts: segmentation, edge coding and texture coding. The segmentation is done by using mathematical morphology. The segments are then coded by representing the edges between segments by a chain cod...


A soft decision MMSE amplitude estimator as a noise preprocessor to speech coders using a glottal sensor

A soft-decision Ephraim-Malah suppression rule based speech enhancement algorithm is proposed for intelligibility enhancement in parametric speech coders. A glottal sensor is used to improve the intelligibility of a baseline system that uses only the acoustic microphone. The objective measure test shows that the proposed system decreases the spectral distortion by 2-3 dB for most phonetic class...




Publication date: 2009
